
Mobile DNA

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Mobile DNA's content profile, based on 27 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Recursive Repeat Extender (RRE): A recursive approach to automatically extend repeat element models

Falcon, F.; Tanaka, E. M.; Rodriguez-Terrones, D.

2026-04-17 bioinformatics 10.64898/2026.04.14.718546 medRxiv
Top 0.1%
32.0%

Repetitive elements, including transposable elements (TEs), are integral structural components of eukaryotic genomes; consequently, their identification and classification are crucial to their study. Several approaches have been developed to perform de novo genome-wide repeat identification through pairwise sequence comparisons; however, they often generate truncated repeat models due to their sampling strategies and the substantial fragmentation of many of the older repeat copies in the genome. To improve repeat models generated de novo, several algorithms have been developed that increase model length via the BEEA (BLAST-Extend-Extract-Align) approach, in which genomic instances of each repeat are identified with BLAST, their coordinates are extended, and a refined model is generated by aligning the extended sequences. Nevertheless, these extension algorithms exhibit two key limitations that hinder the reconstruction of highly degenerate and fragmented repeats: the use of BLAST as a search algorithm - which limits their sensitivity in detecting highly diverged sequences - and the use of a single search step, which precludes the reconstruction of extensively fragmented repeat models. In this work, we present a novel approach to extend repeat models, called RRE (Recursive Repeat Extender), which uses profile hidden Markov models (HMMs) to search for repeat elements with high sensitivity and employs a recursive extension strategy that iteratively searches and extends the repeat model, using the extended model from each round as input for the next and continuing until no additional sequence can be incorporated. We apply RRE to repeat libraries generated de novo from five model organisms, and our results show that RRE-generated repeat libraries contain fewer but longer repeat models and can identify a larger proportion of the genomes as repetitive than RepeatModeler2-generated repeat libraries. 
Notably, RRE can reconstruct highly degenerate repeats such as CR1_Mam, producing a model that achieves similar coverage to the reference Dfam model while extending it by an additional 131 bp that were not captured in the reference model. Overall, RRE enables the automatic improvement of de novo repeat libraries and the reconstruction of highly degenerate and fragmented repeats.
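
The recursive extension idea in this abstract can be illustrated with a toy sketch. It is not the authors' implementation, and every function name below is hypothetical. The sketch assumes exact string matching in place of a profile-HMM search and strict agreement between copies in place of a multiple alignment. It only shows why feeding each extended model back into the next search recovers sequence that a single bounded extension step misses.

```python
def find_hits(genome: str, model: str) -> list[int]:
    """Start positions of exact matches (toy stand-in for a profile-HMM search)."""
    hits, start = [], genome.find(model)
    while start != -1:
        hits.append(start)
        start = genome.find(model, start + 1)
    return hits


def extend_once(genome: str, model: str, flank: int) -> str:
    """One search-extend round: grow the model by up to `flank` bases per side,
    keeping only flanking positions where all genomic copies agree."""
    hits = find_hits(genome, model)
    if len(hits) < 2:
        return model  # nothing to compare against, so no extension
    left = 0
    for k in range(1, flank + 1):
        if any(h - k < 0 for h in hits):
            break
        if len({genome[h - k] for h in hits}) != 1:
            break
        left = k
    right = 0
    for k in range(flank):
        if any(h + len(model) + k >= len(genome) for h in hits):
            break
        if len({genome[h + len(model) + k] for h in hits}) != 1:
            break
        right = k + 1
    first = hits[0]
    return genome[first - left : first + len(model) + right]


def recursive_extend(genome: str, seed: str, flank: int = 3, max_rounds: int = 50) -> str:
    """Repeat search-extend rounds until the model stops growing."""
    model = seed
    for _ in range(max_rounds):
        extended = extend_once(genome, model, flank)
        if len(extended) == len(model):
            break  # no additional sequence could be incorporated
        model = extended
    return model


# Toy genome: two copies of an 11 bp repeat in different flanking contexts.
repeat = "GATTACAGGCT"
genome = "AAAA" + repeat + "CCCC" + repeat + "TTTT"
single_step = extend_once(genome, "TACA", flank=3)
full_model = recursive_extend(genome, "TACA", flank=3)
```

In this toy case a single round with a 3 bp flank stops one base short of the full repeat, while the recursive loop reaches the repeat boundary and then halts because the flanking sequence no longer agrees between copies.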

2
ATHILAfinder: a tool to detect ATHILA LTR retrotransposons in plant genomes

Bousios, A.; Primetis, E.

2026-03-22 bioinformatics 10.64898/2026.03.20.713144 medRxiv
Top 0.1%
22.8%

Motivation: The ATHILA lineage of LTR retrotransposons has colonised all branches of the plant tree of life. In Arabidopsis thaliana and A. lyrata, ATHILA elements have invaded centromeres, influencing the genetic and epigenetic organisation, and driving satellite evolution. To assess the broader significance of ATHILA across plants, a computational pipeline is needed to identify ATHILA elements with high efficiency. Existing tools lack this ability because they are optimised for broad transposon classification at the expense of precise annotation of lower taxonomic levels. Results: We present ATHILAfinder, a pipeline for accurate and large-scale discovery of ATHILA elements. ATHILAfinder uses lineage-specific sequence motifs as seeds and additional filters to build de novo intact elements. Homology-based steps rescue intact ATHILA and identify soloLTRs. A detailed identity card includes coordinates, LTR identity, coding capacity, length and other sequence features for every ATHILA. We validate ATHILAfinder in the A. thaliana Col-CEN assembly and five additional Brassicaceae species, covering four supertribes and ~30 million years of evolution. ATHILAfinder has very low false positive rates and outperforms widely-used tools like EDTA and the deep-learning-based Inpactor2 software for both recovery and precision of ATHILA. To demonstrate its usefulness, we generate insights into ATHILA dynamics across Brassicaceae. Outlook: Few computational pipelines target specific transposon lineages, yet such tools can empower their identification and downstream analyses. Our tailored approach can be adapted to other LTR retrotransposon lineages, offering new ways for high-resolution analysis of transposons.

3
Biological implications of a detailed repeat annotation in Octopus vulgaris

Bonar, M.; Elliot, T. A.; Ahmadi, M. A.; Cottenie, K.; Linquist, S.

2026-03-05 genomics 10.64898/2026.03.03.709284 medRxiv
Top 0.1%
22.2%

Octopuses are phenotypically distinctive organisms, and recent genomic work raises questions about the contributions of transposable elements (TEs) to their genomic architecture. We leveraged a robust repeat annotation pipeline, in combination with manual and automated curatorial techniques, to produce a more comprehensive repeat annotation of Octopus vulgaris. This revealed that ~66% of the genome consists of repeats, in contrast to previous estimates of 43-50%. Whereas previous studies of TE expansion in Octopus bimaculoides identified two bursts of activity, 25 and 56 MYA, our re-annotation revealed four such expansions at 18, 25, 33, and 56 MYA. We further identified a landscape of TE hot- and cold spots. This much refined TE timescape and landscape will serve as a useful basis for understanding TE contributions to O. vulgaris evolution, and also for identifying factors contributing to variation in the TE community across genomic space and evolutionary time.

4
Recurrent LINE 1 exonization drives transcriptome remodelling in NSCLC

Parida, A. S.; Kumar, A.; Tiwari, B.

2026-04-24 genomics 10.64898/2026.04.22.720055 medRxiv
Top 0.1%
14.4%

Long interspersed nuclear element-1 (LINE-1) elements are the only autonomously active transposable elements in the human genome and are known to play an important role in reshaping the transcriptome. Beyond their established role in retrotransposition, LINE-1 sequences affect gene regulation during post-transcriptional processing. Exonization is one such mechanism: transcripts from loci carrying integrated LINE-1 sequences undergo alternative splicing to produce new isoforms. Our work highlights the effect of LINE-1-associated exonization, focusing on the formation of transcript isoforms. Using non-small cell lung cancer (NSCLC) as a model, we conducted a detailed transcriptome study that combines splice junction profiling with gene expression data. Our results show that LINE-1 sequences are often included as exons in host transcripts, leading to the formation of new exons and their various isoforms. The events are supported by splice junction evidence, indicating that they are reliable and reproducible. In particular, repeated analyses revealed certain LINE-1 exonization events that recurred consistently, indicating that LINE-1 elements act as recurrent sources of splice-ready sequences. Although exonization events do not necessarily affect the total expression levels of genes, our study reveals that they contribute to transcript diversity. The resulting diversity of isoforms potentially affects gene function. This study is limited to NSCLC, but it is likely that exonization events play a crucial role in altering RNA diversity in cancers. The study therefore offers new insights into how transposable elements modify gene structure and function during cancer development.

5
TEDEdb: a large-scale resource and multi-cohort analysis of transposable element differential expression in cancer

Calendo, G.; Chaunzwa, M.; Dehzangi, I.; Madzo, J.; Issa, J.-P. J.

2026-04-29 cancer biology 10.64898/2026.04.27.721042 medRxiv
Top 0.1%
14.3%

The human genome consists of nearly 50% repetitive DNA, referred to for decades as "junk DNA". These repetitive sequences, usually under the strict control of epigenetic silencing, have been observed to be aberrantly expressed in cancer. Some of these expressed sequences, e.g., transposable elements (TEs), can induce innate immune responses when de-repressed following treatment with epigenetic therapies. As a result, epigenetic therapy has been suggested to augment cancer therapies. TEs are traditionally ignored in most RNA-seq studies and their expression is often excluded from publicly available data sources. Thus, the vast amount of publicly available RNA-seq data is an untapped resource for exploring the role of TE expression in cancer and cancer treatment. Here, we present a uniform re-analysis of over 7,000 RNA-seq samples, encompassing more than 2,000 differential expression experiments across 220 cancer cell lines and 700 drug treatments. We observed that TE expression is more prone to batch effects than gene expression alone, necessitating the use of meta-analysis techniques to probe the dataset for global trends. We confirm that DNMTis and HDACis are powerful inducers of TEs. We also show that non-epigenetic compounds such as CDK and topoisomerase inhibitors can induce robust up-regulation of transposable elements and confirm that this TE induction is consistent with a viral mimicry response. We make all of the reprocessed data, web application, and database publicly available at: https://dataexplorer.coriell.org/TEDEdb/

6
Evaluating the reliability of tools for mRNA annotation and IRES studies

May, G. E.; Akirtava, C.; McManus, J.

2026-03-31 genomics 10.64898/2026.03.29.707813 medRxiv
Top 0.1%
6.4%

Since the discovery of viral Internal Ribosome Entry Sites (IRESes), researchers have sought to find similar elements in mammalian host genes, termed "cellular IRESes". However, the plasmid systems used to measure cellular IRES activity are vulnerable to false positives due to promoter activity in candidate IRESes. Orthogonal methods are needed to validate putative IRESes while carefully avoiding artifacts known to cause false positives. Recently, Koch et al. proposed approaches for studying IRESes, primarily circular RNA-generating plasmids, and for validating mRNA transcripts using smFISH and qRT-PCR. Here, we demonstrate confounding variables and artifacts in each of these approaches that can lead to inappropriate conclusions about potential cellular IRES activity. We show the back-splicing circRNA plasmid creates linear mRNA artifacts associated with false-positive IRES signals. Using orthogonal, gold-standard assays validated with viral IRESes, we find putative cellular IRESes reported using the back-splicing plasmid have no IRES activity. Furthermore, we demonstrate that smFISH and qRT-PCR can misidentify nuclear non-coding RNAs as mRNAs and we validate a single molecule sequencing assay for identifying genuine mRNA 5' ends. Our work establishes reliable methods for robust transcript annotation and IRES studies that avoid documented artifacts arising from bicistronic and back-splicing circRNA plasmid reporters.

7
Conservation of Long G4-rich (LG4) genomic enhancer regulations

Shaw, M. H.; DeMeis, J. D.; Arnold, C. A.; Cox, M. R.; Duong, T. C.; Gaviria, K. A.; McDavid, G. K.; Villegas, J. M.; Weimer, M. L.; Patil, S. S.; Alqudah, S. Y.; Borchert, G. M.

2026-03-13 genomics 10.64898/2026.03.11.711068 medRxiv
Top 0.1%
6.1%

Long G4-rich regions (LG4s) are defined as DNA sequences containing a high density of guanine triplets capable of forming non-B DNA structures called G-quadruplexes (G4s). These regions frequently overlap with enhancers, which are regulatory DNA elements that modulate gene expression by interacting with DNA regions that dictate where transcription is initiated known as promoters. While LG4s have now been well-characterized in the human genome, neither LG4 occurrence, nor the ability of LG4s to function as enhancers, in other species has been described. To address this, we screened the genomes of 16 different species from various taxa to identify LG4s and then determined if they were conserved, and if so, if their regulatory capacity was similarly conserved. Our analyses characterized a number of previously unreported LG4s in the human genome as well as LG4s in 13 additional species. Of note, we identified a highly conserved LG4 enhancer predicted to regulate over 40 genes. This LG4 is embedded in the MAZ (Myc-Associated Zinc finger protein) locus, and we find this LG4 possesses the ability to directly interact with the same target promoter in both human and mouse. In summary, this work describes LG4s in the genomes of both unicellular and multicellular species including vertebrates, invertebrates, plants, and fungi. Furthermore, many of these LG4 sequences are highly conserved as is their regulatory capacity.

8
Lifestyles of Gypsy-family transposons shape their regulatory mechanisms

Papameletiou, A.-M.; Czech Nicholson, B.; Bornelöv, S.; Hannon, G. J.

2026-05-21 genomics 10.64898/2026.05.19.726053 medRxiv
Top 0.1%
4.3%

Transposable elements are a highly diverse group of selfish genomic elements, prevalent across the tree of life, whose uncontrolled propagation poses a threat to genome stability. Recent studies have explored the evolution of Drosophila melanogaster transposable elements, their co-evolution with the host genome, and mechanisms that regulate their activity. However, little is known about their cross-species evolutionary patterns. Long terminal repeat (LTR) retrotransposons are the most active group of transposable elements in Drosophila. They are broadly separated into retroelements, which are active in the germline, and insect endogenous retroviruses that are active in the soma. Somatic elements are hypothesised to infect the germline through their acquisition of virus-derived proteins such as Envelope and sORF2, thus multiplying through successive generations. In this study, we curated the sequences of LTR retrotransposons in 249 drosophilid genomes, allowing us to study their evolution across these species and highlight their varying degrees of conservation. Furthermore, we reveal multiple instances of Envelope protein loss or inactivation that suggest shifts in the expression pattern of these transposons, likely accompanied by adopting different transcriptional control mechanisms. We contrast this with the evolutionary history of sORF2, which we found to be much more stable. Lastly, we examined variations in transposon LTR regions responsible for transcriptional regulation and use predictive modelling to suggest six transcription factors likely involved in their tissue-specific expression. Altogether, we reveal complex, interspecies evolutionary patterns of Gypsy-family LTR retrotransposons and highlight examples of their co-evolution with their host genome.

9
lncOriL, a novel polyadenylated mitochondrial lncRNA common to zebrafish and human

Jorgensen, T. E.; Wardale, A.; Wolf Profant, S.; Amundsen, C.; Emblem, A.; Joakimsen, I. S.; Brekke, O.-L.; Karlsen, B. O.; Babiak, I.; Johansen, S. D.

2026-03-27 molecular biology 10.64898/2026.03.26.714394 medRxiv
Top 0.1%
4.0%

Even though teleost fish and mammals share the same mitochondrial gene content and organization, the teleost mitochondrial transcriptome is still poorly understood. We characterized the mitochondrial transcriptome during zebrafish (Danio rerio) early development by long-read direct RNA sequencing. All heavy-strand specific mRNAs were found to carry 3' poly-A tails of approximately 50-60 residues, and the transcriptome profile was distinctive but practically invariant between stages. Three unusual transcripts were however noted. These included two mRNAs (COI and ND5 mRNAs), with significant 3' untranslated regions corresponding to antisense gene sequences, and a previously undescribed noncoding RNA named here lncOriL. The ND5 mRNA was found to carry one third of all detected m6A methylation sites in the zebrafish mitochondrial transcriptome. The 313 nt-long lncOriL transcript had an abundance comparable to that of ND5 mRNA and it mapped to the mitochondrial genome region covering the origin of light strand replication and four flanking antisense tRNAs. A mitochondrial tRNA-derived fragment (tiRNA5-Asn), with a 35 nt perfect pairing-potential to lncOriL, was present at all stages. Additional analyses including adult zebrafish, scissortail (Rasbora rasbora), and monkfish (Lophius piscatorius) strongly corroborate the results of COI mRNA, ND5 mRNA, and lncOriL transcript prevalence among teleost fish. Surprisingly, our findings in zebrafish were further supported by mitochondrial transcriptome analyses in domestic pig (Sus scrofa) and human (Homo sapiens), including tiRNA5-Asn commonly present in human tissues, suggesting that lncOriL is ubiquitously expressed and regulated in vertebrates. Author Summary: Mitochondria contain their own genome and produce essential RNAs needed for energy production. Although fish and mammals share the same mitochondrial gene organization, less is known about how mitochondrial RNAs are processed and regulated in teleosts. 
Using Nanopore direct RNA sequencing, we examined mitochondrial RNAs during early zebrafish development and discovered three unusual transcripts that include extended non-coding regions. Two of these molecules, COI and ND5 mRNAs, carry long 3' untranslated regions formed by antisense gene sequences, suggesting previously unrecognized regulatory potential. We also identified lncOriL, a highly structured long noncoding RNA that spans the origin of light-strand replication and is abundant during development. Strikingly, the same RNA feature, including lncOriL and a matching tRNA-derived small RNA (tiRNA5-Asn), was found not only in zebrafish but also in human mitochondrial transcriptomes. These findings support conservation of regulatory mitochondrial RNAs across main groups of vertebrate species. Our work reveals a new layer of mitochondrial RNA regulation and expands the current understanding of how mitochondrial gene expression is controlled.

10
Comparative analysis of transposable elements in jellyfish and hydroid species (Cnidaria: Medusozoa)

Mays, A.; Cabrera, F.; Macias-Munoz, A.

2026-04-21 evolutionary biology 10.64898/2026.04.17.719288 medRxiv
Top 0.1%
4.0%

Background: Transposable elements (TEs) are repetitive genetic elements that can jump to new loci causing genome expansions, structural rearrangements, and can, ultimately, propel the evolution of genomes. Despite their significance, the role of TEs in the evolution of genomes and phylogenetic groups remains largely understudied in early diverging lineages. Further, the extent to which TE content varies across species is still an open question. Medusozoa, a group within Cnidaria encompassing jellyfish and hydroids, exhibits an exceptional diversity of life history strategies, body plans, and physiological capabilities. These characteristics, along with its early-diverging phylogenetic position, establish Medusozoa as an ideal system for investigating the composition and evolutionary history of TEs within the group. Results: We generated a custom repeat library built from annotations of 25 Medusozoan genomes and used it to characterize TEs, aiming to identify lineage-specific TE content and activity that may correlate with the diversity observed within the group. We found that repetitive element percentage and genome size varied considerably, with Hydrozoa exhibiting the most variation among classes in both respects. DNA transposons were the most prevalent TE classification in all but two genomes, averaging 28% of all genomes. Intra-genus comparisons revealed a surprising degree of differences in TE content. In the genus Aurelia, the expansion of a single DNA transposon superfamily accounted for much of the difference in repetitive element percentage between two species, whereas in the genus Turritopsis, a similar divergence resulted from the proliferation of multiple superfamilies. Interestingly, most genomes showed evidence of recent TE expansions, suggesting ongoing activity in many medusozoan species. Conclusion: We present the first comparative analysis of TEs across all medusozoan classes. 
Our results reveal class-specific TE dynamics and highlight cases of TE proliferations as lineages diverge. This research provides data on TE activity and diversity that can be used as a resource for future study and fills important gaps in our understanding of TEs in early diverging animal lineages.

11
A long-read RNA sequencing and polysome profiling framework reveals transposable element-driven transcript diversity and translational rewiring in glioblastoma

Pizzagalli, M.; Sasipalli, S.; Leary, O.; Tran, L.; Haas, B.; Tapinos, N.

2026-04-21 cancer biology 10.64898/2026.04.18.719388 medRxiv
Top 0.1%
3.9%

Background: Transposable elements (TEs) account for over half of the human genome and are often derepressed in cancer. TEs can add cryptic splice sites, undergo exonization, and generate gene-TE fusion transcripts, but the combined effects of TEs on RNA processing and translation in glioblastoma stem cells (GSCs) remain incompletely elucidated. Results: We combined long-read RNA sequencing with polysome profiling in four patient-derived GSCs and two neural stem cell (NSC) controls to resolve TE-associated transcript diversity and its relationship to ribosomal engagement. Across GSCs, we identified 13,421 alternative splicing (AS) events, 3,077 of which contained TEs within 150 bp of splice junctions. AS sites proximal to TEs were associated with increased isoform switching compared to non-TE-associated AS sites (odds ratio 2.9-4.3). Moreover, AS isoforms generated from TE-proximal sites were more likely to exhibit altered ribosomal association (odds ratio 2.54). Directional shifts were observed, with shorter isoforms associating with monosome fractions and longer isoforms with polysome fractions. To enable systematic detection of gene-TE chimeric transcripts, we developed FuTER (Fusion TE Reporter), a long-read-based framework for identifying TE-associated fusions. Application to GSC datasets identified 78 GSC-enriched fusion transcripts, several supported by breakpoint-spanning reads in polysome fractions, consistent with ribosome association. Conclusions: Our data suggest that TEs correlate with abnormal splicing activity and altered ribosome engagement in glioblastoma stem cells. By integrating long-read sequencing with polysome profiling and fusion detection, we establish a framework for analysis of TE-induced transcript diversity and its effects on cancer evolution and plasticity.

12
MAJEC: unified gene, isoform, and locus-level transposable element quantification from RNA-seq

Lim, T.-Y.; Firestone, A. J.

2026-04-14 bioinformatics 10.64898/2026.04.10.717472 medRxiv
Top 0.1%
2.7%

Background: The study of transposable elements (TEs) has become increasingly central to fields such as cancer biology, immunology, and aging. Accurately quantifying disease- or laboratory-mediated perturbations in these elements is critical to support this expanding research, yet current RNA-seq pipelines struggle with the pervasive overlap between TEs and protein-coding genes. Existing tools either aggregate to the subfamily level with no locus resolution (TEtranscripts), or provide locus-level quantification without modeling gene overlap (Telescope), with the latter attributing over 40% of TE signal to the 1.1% of loci that overlap gene exons. Results: We present MAJEC (Momentum Accelerated Junction Enhanced Counting), a unified Expectation-Maximization (EM) framework that jointly quantifies genes, transcript isoforms, and individual TE loci from BAM alignments in a single pass. Splice junction evidence informs transcript-level priors, enabling MAJEC to probabilistically distinguish genic from TE-derived reads. This approach was independently validated against Salmon and RSEM on isoform quantification benchmarks. The joint feature space reduces exon-overlap contamination of locus-level TE estimates from 43% of total signal (Telescope) to 5% (MAJEC), while preserving subfamily-level accuracy (differential expression r = 0.987 vs TEtranscripts). Using paired biological vignettes, we demonstrate that MAJEC correctly resolves both the false TE reactivation artifacts endemic to TE-only models, and the false gene upregulation artifacts that occur when heuristic rules misassign genuine intragenic TE transcription. Conclusion: MAJEC simultaneously produces the isoform and locus-level resolution that TEtranscripts lacks, with greater accuracy than Telescope, and runs faster than either.
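
The joint gene and TE quantification described in this abstract rests on Expectation-Maximization over reads that are compatible with more than one feature. The sketch below is a generic textbook EM for that setting, not MAJEC's model. It omits splice-junction priors and momentum acceleration, and the feature labels and read counts are invented purely for illustration.

```python
def em_quantify(read_compat, n_iter=100):
    """Estimate relative feature abundances from ambiguous read assignments.

    read_compat: list of non-empty sets, each holding the (hypothetical)
    feature names one read is compatible with.
    """
    features = sorted(set().union(*read_compat))
    # Start from a uniform guess over all features.
    theta = {f: 1.0 / len(features) for f in features}
    for _ in range(n_iter):
        expected = dict.fromkeys(features, 0.0)
        # E-step: split each read across its compatible features
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[f] for f in compat)
            for f in compat:
                expected[f] += theta[f] / total
        # M-step: abundances become the normalised expected read counts.
        theta = {f: expected[f] / len(read_compat) for f in features}
    return theta


# Invented toy data: 6 reads unique to a gene exon, 2 unique to a TE locus,
# and 4 reads that overlap both.
reads = [{"gene_exon"}] * 6 + [{"TE_locus"}] * 2 + [{"gene_exon", "TE_locus"}] * 4
abundance = em_quantify(reads)
```

On this toy input the four ambiguous reads end up split 3:1 in favour of the gene, mirroring the ratio of uniquely assigned reads, so the TE locus is not credited with signal that is better explained by the overlapping exon.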

13
Knob K180 Constitutive Heterochromatin Of Maize Exhibit Tissue-Specific Chromatin Sensitive Profiles Distinct From Other Types Of Heterochromatins

Sattler, M. C.; Singh, A.; Bass, H. W.; Mondin, M.

2026-04-04 genetics 10.64898/2026.04.01.715864 medRxiv
Top 0.1%
2.6%

Background: Maize knobs are regions of constitutive heterochromatin that are readily identified in both meiotic and somatic chromosomes. These structures have been characterized as stable throughout the cell cycle, exhibiting late replication during the S-phase, and are composed of two specific families of highly repetitive DNA sequences: K180 and TR-1. Although widely used as cytogenetic markers due to their variability in number and chromosomal position across inbred lines, hybrids, and landraces, little is known about their chromatin structure and dynamics. In this study, we analyzed chromatin accessibility of knobs using DNS-seq data across four maize tissues representing distinct developmental stages. Results: Our results reveal that K180 knobs exhibit tissue-specific variation in chromatin accessibility, transitioning between open and closed states during development. In contrast, the TR-1 knob of chromosome 4 remained consistently inaccessible across all tissues analyzed. A knob composed of both K180 and TR-1 further supported this observation, with only the K180 region showing dynamic accessibility. To validate these findings, we also analyzed other repetitive regions such as centromeres, which showed a uniformly closed chromatin structure similar to TR-1. These results suggest a unique developmental modulation of chromatin accessibility associated with K180 repeats. While the chromatin accessibility of knobs does not reach the levels observed at Transcription Start Sites (TSS), the comparison among different classes of repetitive DNA within maize constitutive heterochromatin provides compelling evidence for sequence-specific and tissue-specific chromatin dynamics. Conclusions: Our findings uncover a previously unrecognized property of maize knobs and establish a reference for future studies on chromatin organization and epigenetic regulation of repetitive DNA in plant genomes.

14
TEsingle enables locus-specific transposable element expression analysis at single-cell resolution

Forcier, T.; Cheng, E.; Tam, O. H.; Wunderlich, C.; Castilla-Vallmanya, L.; Jones, J. L.; Quaegebeur, A.; Barker, R. A.; Jakobsson, J.; Gale Hammell, M.

2026-03-22 genomics 10.64898/2026.03.19.712984 medRxiv
Top 0.1%
2.3%

Transposable elements (TEs) are mobile genetic sequences that can generate new copies of themselves via insertional mutations. These viral-like sequences comprise nearly half the human genome and are present in most genome-wide sequencing assays. While only a small fraction of genomic TEs have retained their ability to transpose, TE sequences are often transcribed from their own promoters or as part of larger gene transcripts. Accurately assessing TE expression from each individual genomic TE locus remains an open problem in the field, due to the highly repetitive nature of these multi-copy sequences. These issues are compounded in single-cell and single-nucleus transcriptome experiments, where additional complications arise due to sparse read coverage and unprocessed mRNA introns. Here we present our tool for single-cell TE and gene expression analysis, TEsingle. Using synthetic datasets, we show the problems that arise when not properly accounting for intron retention events, failing to address uncertainty in alignment scoring, and failing to make use of unique molecular identifiers for transcript resolution. Addressing these challenges has enabled an accurate TE analysis suite that simultaneously tracks gene expression as well as locus-specific resolution of expressed TEs. We showcase the performance of TEsingle using single-nucleus profiles from substantia nigra (SN) tissues of Parkinson's Disease (PD) patients. We find examples of young and intact TEs that mark dopaminergic neurons (DA) as well as many young TEs from the LINE and ERV families that are elevated in PD neurons and glia. These results demonstrate that TE expression is highly cell-type and cellular-state specific and elevated in particular subsets of neurons, astrocytes, and microglia from PD patients.

15
Molecular and functional characterization of telomeric repeat-containing RNAs in Chinese hamster ovary cells

Domingues-Silva, B.; Azzalin, C. M.

2026-04-02 cell biology 10.64898/2026.04.01.715793 medRxiv
Top 0.1%
2.1%

Mammalian telomeric DNA comprises long tracts of tandem TTAGGG repeats. The same repeats are also found at internal chromosomal regions called interstitial telomeric sequences (ITSs). Telomeres are transcribed into UUAGGG-containing transcripts, named TERRA, which serve multiple functions in maintaining telomere integrity. Complementary RNAs containing C-rich telomeric repeats, named ARIA, have also been identified in a few yeast mutants and in mammalian cells with dysfunctional telomeres. The molecular features and functions of ARIA remain understudied, mainly due to its low abundance and the lack of suitable cellular systems. Here, we show that Chinese hamster ovary (CHO) cells produce abundant TERRA and ARIA transcripts, predominantly originating from ITSs. Both RNAs are polyadenylated, exhibit relatively short half-lives and form large cellular foci. We also show that ARIA depletion leads to exposure of single-stranded (ss) DNA at ITSs and that ssDNA exposure increases when ITS DNA is damaged. ssDNA formation does not require the DNA damage signaling kinases ATM and ATR, nor the exonucleases DNA2 and EXO1; however, ATM prevents excessive ssDNA accumulation when ARIA function is inhibited. These findings establish CHO cells as a powerful model to dissect telomeric RNA functions and reveal ARIA as a key regulator of telomeric repeat DNA integrity.

16
Repetitive DNA shapes genome architecture and chromosomal diversification in birds of prey

Souza, G. M.; Vidal, J. A. D.; Toma, G. A.; Kretschmer, R.; DE OLIVEIRA, E. H. C.; Liehr, T.; Cioffi, M. d. B.

2026-03-05 genetics 10.64898/2026.03.03.709396 medRxiv
Top 0.1%
2.0%

The evolution of genome architecture occurs through dynamic interactions between repetitive DNAs and chromosomal organization; nevertheless, the processes underlying these mechanisms are not well understood. This study presents a comprehensive genomic and cytogenetic analysis of repetitive DNA evolution across Accipitridae birds, a raptor family notable for its significant chromosomal variation. We aimed to investigate how repetitive DNAs have evolved across Accipitriform lineages and test whether shifts in repeat composition are associated with patterns of species diversification. Comparative investigations of eight genomes reveal lineage-specific spikes of transposable elements and satellite DNAs that substantially modify genome composition while preserving a common structural framework. Temporal insertion profiles indicate that repeat turnover is ongoing and frequently coincides with lineages exhibiting extensive chromosomal reorganization. By integrating comparative repeatome analyses with in silico and cytogenetic mapping, we elucidate the spatial architecture governing repeat dynamics, connecting molecular turnover to their chromosomal structure. These findings underscore the effectiveness of merging genomic and chromosomal data to elucidate the impact of repeat landscapes on chromosomal and genomic evolution.

17
Cp36 serine recombinase as a new tool for zebrafish transgenesis

Thrikawala, S.; Naples, B.; Rosowski, E.

2026-05-11 genetics 10.64898/2026.05.06.723361 medRxiv
Top 0.1%
1.7%

One feature key to the versatility of zebrafish as an animal model for biomedical research is the breadth of genetic tools available, including for transgenesis. While the Tol2 transposase system remains the gold standard, its efficiency can be highly variable. Here, we explored the potential of a complementary transgenesis system, Cp36, a large serine recombinase identified from Clostridium perfringens previously found to efficiently integrate target cargo into the human genome without a preinstalled attB site. We generated Cp36-based plasmid constructs for zebrafish transgenesis and compared their performance to matched Tol2 plasmids across multiple experimental contexts, including transient expression, germline transmission, and multi-transgene expression. Cp36 integrates small ~3.5 kb cargo into the zebrafish genome and transmits to the next generation as efficiently as Tol2, but Cp36 performance declines substantially for larger ~7.5 kb constructs. Both Cp36 and Tol2 have comparable efficiency in transiently expressing a second construct regardless of the transposase/recombinase used to integrate the first construct, indicating compatibility with sequential transgenesis strategies. In summary, we demonstrate that Cp36 functions as a new alternative transgenesis method in zebrafish.

18
Short Interrupted Repeats Cassette (SIRC) ensembles of plant genomes reflects evolutionary route

Gorbenko, I. V.; Scherbakov, D. Y.; Zverintseva, K. M.; Konstantinov, Y. M.

2026-03-30 plant biology 10.64898/2026.03.27.714674 medRxiv
Top 0.1%
1.7%

Short Interrupted Repeats Cassettes (SIRC) are recently discovered eukaryotic DNA elements that possess many traits of satellite DNA and mobile genetic elements and consist of short direct repeats interspersed with diverse spacer sequences. The SIRC ensemble of an individual species is highly heterogeneous and cannot be studied using alignment methods. The number of similar SIRC sequences in a given pair of species is in general correlated with their taxonomic distance; at the same time, closely related species can possess highly diverged SIRC ensembles, which makes the SIRC evolutionary pattern closer to that of mobile genetic elements. The SIRC sequences form clusters with comparable sequence patterns that are likely to follow a doublet evolutionary model, which strongly suggests that the SIRC structure is maintained by evolutionary selection. Several SIRC sequences of Arabidopsis were found to be of ancient origin, with an evolutionary history traceable as far back as the moss clade. We carried out unbiased detection of SIRC ensembles in 10 plant genomes and found that, despite very high intraspecies heterogeneity, SIRC sets possess a strong interspecies phylogenetic signal. Key message: Short Interrupted Repeats Cassettes are elements of ancient origin and could potentially be used to trace organism history and to facilitate synteny and Hi-C analysis.

19
Identification of cellular double-stranded RNAs in mammalian embryonic stem cells

Kases, K. J.; Marchante, P. G.; Witteveldt, J.; Peris, G.; Heras, S. R.; Macias, S.

2026-04-15 molecular biology 10.64898/2026.04.13.718158 medRxiv
Top 0.1%
1.7%

Antiviral defence mechanisms are typically activated upon sensing virus-derived nucleic acids. During replication, viruses generate double-stranded RNA (dsRNA) intermediates that the innate immune system can sense, triggering several defence pathways. Conversely, mammalian cells avoid accumulating their own endogenous dsRNA to prevent activating these defence mechanisms. However, we demonstrate that mammalian embryonic stem cells (ESCs) accumulate endogenous dsRNA without activating these responses, as they lack all classical dsRNA-mediated antiviral pathways. To identify these endogenous dsRNAs, we have developed a method that couples antibody-based purification to RNase I treatment to enrich bona fide dsRNAs. The RNase I treatment results in an enrichment of sense/antisense and A-to-I edited transcripts, suggesting that they are true dsRNAs in cells. Our refined protocol reveals that transposable elements (TEs), primarily the young elements from the LINE and LTR classes, are the predominant sources of dsRNA in ESCs. This approach will be useful for investigating the role of dsRNA in disease settings, such as autoimmunity or cancer, where endogenous dsRNA accumulation has also been observed.

20
Distribution of prophage-encoded Pas sRNAs across pathogenic Escherichia coli

Zhu, D. X.; Shabalina, S. A.; Storz, G.

2026-05-14 evolutionary biology 10.64898/2026.05.13.724894 medRxiv
Top 0.1%
1.6%

Numerous base-pairing small RNAs (sRNAs), which are an integral part of regulatory networks in bacteria, are encoded on mobile genetic elements (MGEs) in pathogenic strains of Escherichia coli. These sRNAs help coordinate the expression of MGE-encoded virulence factors with core genome-encoded cellular pathways. To investigate the evolution of MGE-encoded sRNAs, we queried public databases to characterize the distribution of PasA, PasB, PasC, PasD1, and PasD2, five prophage-encoded sRNAs discovered in enteropathogenic E. coli. We find that while the Pas sRNAs are largely restricted to pathogenic lineages of Escherichia and Shigella, they exhibit diversity in sequence, genomic presence, and copy number across strains. Based on phylogenetic analysis, the Pas sRNAs originate from multiple ancestral lineages and associate with specific E. coli pathovars, consistent with horizontal acquisition followed by retention. Syntenic analysis suggests a phage origin for the Pas sRNAs, likely from Shiga toxin-encoding phages, but the sRNAs appear to have diverged substantially following their integration into bacterial chromosomes. Comparative and structural analyses further suggest that the PasA and PasC sRNAs share a common ancestor, as is the case for PasD and STnc100, another prophage-encoded sRNA. These findings add to our understanding of how accessory genome-encoded sRNAs emerge and evolve.